Position Heaps for Parameterized Strings

Authors

  • Diptarama
  • Takashi Katsura
  • Yuhei Otomo
  • Kazuyuki Narisawa
  • Ayumi Shinohara
Abstract

We propose a new indexing structure for parameterized strings, called the parameterized position heap. It is applicable to the parameterized pattern matching problem, in which a pattern matches a substring of the text if there exists a bijective mapping from the symbols of the pattern to the symbols of the substring. We give an online algorithm for constructing the parameterized position heap of a text and show that it runs in time linear in the text size. We also show that, using the parameterized position heap, all occurrences of a pattern in the text can be found in time linear in the product of the pattern size and the alphabet size.
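The matching condition described in the abstract can be illustrated with a minimal sketch. This is a naive quadratic-time check, not the parameterized position heap or its construction algorithm. The function names `p_match` and `naive_p_search` are hypothetical and chosen only for this illustration, and every symbol is assumed to be a parameter symbol that may be renamed.

```python
from typing import Dict, List


def p_match(x: str, y: str) -> bool:
    """Return True if some bijection on symbols maps x onto y position by position."""
    if len(x) != len(y):
        return False
    fwd: Dict[str, str] = {}  # symbol of x -> symbol of y
    bwd: Dict[str, str] = {}  # symbol of y -> symbol of x (enforces injectivity)
    for a, b in zip(x, y):
        # setdefault records the pair on first sight and returns the
        # previously recorded partner otherwise.
        if fwd.setdefault(a, b) != b or bwd.setdefault(b, a) != a:
            return False
    return True


def naive_p_search(text: str, pattern: str) -> List[int]:
    """Naive baseline (hypothetical helper): test every window of the text."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_match(pattern, text[i:i + m])]


if __name__ == "__main__":
    print(p_match("abab", "xyxy"))           # consistent renaming a->x, b->y
    print(p_match("ab", "xx"))               # not injective, so no bijection
    print(naive_p_search("xyxyzyz", "aba"))
```

The sketch spends time proportional to the pattern length at every text position. The abstract's claim is that an index built once over the text avoids this per-position cost.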


Similar resources

The Position Heap of a Trie

The position heap is a text indexing structure for a single text string, recently proposed by Ehrenfeucht et al. [Position heaps: A simple and dynamic text indexing data structure, Journal of Discrete Algorithms, 9(1):100-121, 2011]. In this paper we introduce the position heap for a set of strings, and propose an efficient algorithm to construct the position heap for a set of strings which is ...


Constructing LZ78 Tries and Position Heaps in Linear Time for Large Alphabets

We present the first worst-case linear-time algorithm to compute the Lempel-Ziv 78 factorization of a given string over an integer alphabet. Our algorithm is based on nearest marked ancestor queries on the suffix tree of the given string. We also show that the same technique can be used to construct the position heap of a set of strings in worst-case linear time, when the set of strings is give...


Parameterized Duplication in Strings: Algorithms and an Application to Software Maintenance

As an aid in software maintenance, it would be useful to be able to track down duplication in large software systems efficiently. Duplication in code is often in the form of sections of code that are the same except for a systematic change of parameters such as identifiers and constants. To model such parameterized duplication in code, this paper introduces the notions of parameterized strings ...


Analysis of String Sorting using Heapsort

In this master thesis we analyze the complexity of sorting a set of strings. It was shown that the complexity of sorting strings can be naturally expressed in terms of the prefix trie induced by the set of strings. The model of computation takes into account symbol comparisons and not just comparisons between the strings. The analysis of upper and lower bounds for some classical algorithms such...


On the Longest Common Parameterized Subsequence

The well-known problem of the longest common subsequence (LCS), of two strings of lengths n and m respectively, is O(nm)-time solvable and is a classical distance measure for strings. Another well-studied string comparison measure is that of parameterized matching, where two equal-length strings are a parameterized-match if there exists a bijection on the alphabets such that one string matches ...




Publication date: 2017